DErivBase: Inducing and Evaluating a Derivational Morphology Resource for German
Authors
Abstract
Derivational models are still an under-researched area in computational morphology. Even for German, a rather resource-rich language, there is a lack of large-coverage derivational knowledge. This paper describes a rule-based framework for inducing derivational families (i.e., clusters of lemmas in derivational relationships) and its application to create a high-coverage German resource, DERIVBASE, mapping over 280k lemmas into more than 17k non-singleton clusters. We focus on the rule component and a qualitative and quantitative evaluation. Our approach achieves up to 93% precision and 71% recall. We attribute the high precision to the fact that our rules are based on information from grammar books.
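The idea of grouping lemmas into derivational families with rules can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three suffix rules, the lowercase toy lemmas, and the helper names `derive` and `induce_families` are hypothetical and are not taken from DERIVBASE, whose actual rule set and clustering procedure are not given in this abstract. The sketch applies each rule to each lemma and, when the result is also a known lemma, merges the two into one cluster using union-find.

```python
from collections import defaultdict

# Hypothetical toy rules as (base suffix, derived suffix) pairs.
# These are illustrative only and are NOT the DErivBase rules.
RULES = [
    ("en", "ung"),  # verb -> noun, e.g. "bilden" -> "bildung"
    ("en", "er"),   # verb -> agent noun, e.g. "lehren" -> "lehrer"
    ("", "lich"),   # noun -> adjective, e.g. "freund" -> "freundlich"
]


def derive(lemma, rule):
    """Apply one suffix rule; return the derived form or None if it does not match."""
    old, new = rule
    if not lemma.endswith(old):
        return None
    return lemma[: len(lemma) - len(old)] + new


def induce_families(lemmas, rules=RULES):
    """Cluster lemmas that are linked by at least one rule application."""
    lemma_set = set(lemmas)
    parent = {lemma: lemma for lemma in lemma_set}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for lemma in lemma_set:
        for rule in rules:
            derived = derive(lemma, rule)
            # Only link forms that are attested in the lemma list.
            if derived in lemma_set:
                parent[find(lemma)] = find(derived)

    families = defaultdict(set)
    for lemma in lemma_set:
        families[find(lemma)].add(lemma)
    return [sorted(members) for members in families.values()]


if __name__ == "__main__":
    toy = ["lehren", "lehrer", "freund", "freundlich", "bilden", "bildung", "haus"]
    for family in sorted(induce_families(toy)):
        print(family)
```

Restricting links to attested lemmas is one simple way to keep precision high: a rule that overgenerates only produces a link if its output actually occurs in the lemma list. Lemmas that no rule connects, such as "haus" here, remain singleton clusters.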
Similar resources
Dimensionality Reduction in Semantic Vector Spaces Using a Derivational Resource
Lexical semantic vector spaces model the meaning of words by representing the co-occurrence statistics of words in certain contexts. Words are considered similar if they occur in similar contexts, so that word similarity can be predicted by comparing their representations in the semantic vector space. Two major problems in those vector spaces are their size and their sparsity. Due to the charact...
A Language-independent Approach to Extracting Derivational Relations from an Inflectional Lexicon
In this paper, we describe and evaluate an unsupervised method for acquiring pairs of lexical entries belonging to the same morphological family, i.e., derivationally related words, starting from a purely inflectional lexicon. Our approach relies on transformation rules that relate lexical entries with one another, and which are automatically extracted from the inflected lexicon based on su...
Are doggies cuter than dogs? Emotional valence and concreteness in German derivational morphology
The semantic behavior of derivational processes has been investigated with compositional distributional models relating the meaning of base, affix, and derivative (e.g., anti + capitalist → anticapitalist). While broadly successful, these approaches model how distributional behavior is affected by derivation in general. Meanwhile, their predictions cannot be interpreted at the level of linguis...
DerivBase.hr: A High-Coverage Derivational Morphology Resource for Croatian
Knowledge about derivational morphology has proven useful for a number of natural language processing (NLP) tasks. We describe the construction and evaluation of DERIVBASE.HR, a large-coverage morphological resource for Croatian. DERIVBASE.HR groups 100k lemmas from the web corpus hrWaC into 56k clusters of derivationally related lemmas, so-called derivational families. We focus on suffixal de...
CroDeriV: a new resource for processing Croatian morphology
The paper deals with the processing of Croatian morphology and presents CroDeriV – a newly developed language resource that contains data about morphological structure and derivational relatedness of verbs in Croatian. In its present shape, CroDeriV contains 14 192 Croatian verbs. Verbs in CroDeriV are analyzed for morphemes and segmented into lexical, derivational and inflectional morphemes. T...
Publication date: 2013